Scalable and Efficient Neural Speech Coding: A Hybrid Design

Abstract

We present a scalable and efficient neural waveform coding system for speech compression. We formulate the speech coding problem as an autoencoding task, where a convolutional neural network (CNN) performs encoding and decoding as a neural waveform codec (NWC) during its feedforward routine. The proposed NWC also defines quantization and entropy coding as a trainable module, so coding artifacts and bitrate control are handled during the optimization process. We achieve efficiency by introducing compact model components to NWC, such as gated residual networks and depthwise separable convolution. Furthermore, the proposed models come with a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we employ the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to restore any reconstruction loss that its preceding modules have created. CMRL can scale down to cover lower bitrates as well, for which it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC's quantization as a differentiable process, making the system trainable in an end-to-end manner. The decoder of the proposed system has either one NWC (0.12 million parameters) in the low to medium bitrate ranges (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although the decoding complexity is not yet as low as that of conventional speech codecs, it is significantly reduced from that of other neural speech coders, such as a WaveNet-based vocoder. For wide-band speech coding quality, our system yields comparable or superior performance to AMR-WB and Opus on TIMIT test utterances at low and medium bitrates. The proposed system can scale up to higher bitrates to achieve near transparent performance.
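Two ideas in the abstract can be made concrete with a small sketch. The first part mimics the cascading idea behind cross-module residual coding: each module codes only what the preceding modules failed to reconstruct. The second part counts weights for a standard 1-D convolution versus a depthwise separable one. Everything here is an illustrative assumption rather than the authors' implementation: the "modules" are plain uniform scalar quantizers instead of trained CNN autoencoders, and the function names, step sizes, kernel size, and channel counts are hypothetical.

```python
import math


def quantize(signal, step):
    """Toy stand-in for one coding module: uniform scalar quantization.

    Assumption: the paper's modules are learned autoencoders; a fixed
    quantizer is used here only to show the residual cascade.
    """
    return [step * round(x / step) for x in signal]


def cascade_encode_decode(signal, steps):
    """Run a cascade in which each module codes the residual left so far.

    Returns the final reconstruction and the maximum absolute error
    measured after each module has contributed.
    """
    residual = list(signal)
    reconstruction = [0.0] * len(signal)
    errors = []
    for step in steps:
        decoded = quantize(residual, step)
        # Add this module's contribution to the running reconstruction.
        reconstruction = [r + d for r, d in zip(reconstruction, decoded)]
        # The next module only sees what is still missing.
        residual = [x - r for x, r in zip(signal, reconstruction)]
        errors.append(max(abs(e) for e in residual))
    return reconstruction, errors


def conv1d_params(kernel, c_in, c_out):
    """Weight count of a standard 1-D convolution (bias ignored)."""
    return kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weight count of a depthwise conv followed by a 1x1 pointwise conv."""
    return kernel * c_in + c_in * c_out


# Hypothetical test signal and step sizes (coarse to fine).
steps = [0.5, 0.1, 0.02]
signal = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
reconstruction, errors = cascade_encode_decode(signal, steps)

for step, err in zip(steps, errors):
    print(f"step={step}: max error after this module = {err:.4f}")

# Hypothetical layer shape: kernel size 9, 100 input and output channels.
print(conv1d_params(9, 100, 100), depthwise_separable_params(9, 100, 100))
```

In this toy setting the error after each module is bounded by half of that module's step size, so adding modules trades extra bits for lower distortion, which is the scalability argument in miniature. The parameter count shows why depthwise separable layers help keep a model compact.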

Similar resources

Efficient scalable speech compression for scalable speech recognition

We propose a scalable recognition system for reducing recognition complexity. Scalable recognition can be combined with scalable compression in a distributed speech recognition (DSR) application to reduce both the computational load and the bandwidth requirement at the server. A low complexity preprocessor is used to eliminate the unlikely classes so that the complex recognizer can use the redu...

Efficient and universal scalable video coding

This paper proposes a unified efficient and universal scalable video coding framework that supports different scalabilities, such as fine granularity quality, temporal, spatial and complexity scalabilities. The proposed framework is established upon the recent studies in fine granularity scalable (FGS) video coding. It contains two key points. Firstly, in order to improve the coding efficiency ...

A Scalable and Efficient Design of WebSignSys

WebSignSys is an infrastructure which bridges the physical world and the virtual world (World Wide Web). It provides a form of augmented reality by creating, mapping, delivering, and detecting websigns (introduced in [7]), which are defined as hyperlinks from physical locations to web resources. In WebSignSys design, there are many challenges, such as scalability, efficiency, etc. In this repor...

An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition

A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme involves computing a 2D DCT on blocks of feature vectors followed by uniform scalar quantization, run-length and Huffman coding. Digit recognition experiments were conducted in which training was done with unquantized cepstral features from clean speech and testing...

Memory Efficient Scalable Line-based Image Coding

We study the problem of memory-efficient scalable image compression and investigate some tradeoffs in the complexity vs. coding efficiency space. The focus is on a low-complexity algorithm centered around the use of sub-bit-planes, scan-causal modeling, and a simplified arithmetic coder. This algorithm approaches the lowest possible memory usage for scalable wavelet-based image compression and demons...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3129353